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ABSTRACT 

Presently days there are numerous android applications 
that mean to discover articles fulfilling both spatial 
predicate and a predicate on their related writings. In 
this paper, for discovering closest accommodation a 
basic arrangement is presented which depends on IR2 
tree [1].IR2 tree incorporates couple of insufficiencies 
that influence its productivity [2], To build the 
proficiency another technique called spatial reversed 
list [1] is presented that extends the standard modified 
record to address multidimensional data. This new SI 
file technique accompanies calculations which will 
answer closest neighbor questions with essential words 
progressively. As confirmed by trials, this anticipated 
system performs the IR2 tree in question time response 
essentially. Customary reflection inquiries, as change 
hunt and closest neighbor recovery, include singularly 
conditions on objects' geometric properties. Instantly 
the best determination to such inquiries is predicated on 
the IR2-tree, which, as indicated amid this paper, 
highlights a couple of inadequacies that truly effect its 
power. Incited by this, we have a tendency to build up a 
substitution access procedure known as the deliberation 
reversed file that extends the standard altered list to 
address level learning, and accompanies calculations 
that may answer closest neighbor questions with pivotal 
words progressively. As confirmed by examinations, 
the anticipated procedures outgo the IR2-tree being 
referred to inert period extensively, ordinarily by a 
component of requests of greatness. 

Keywords : Data Mining, Spatial Databases, Outliers, 
Keyword Search, Nearest Neighbor search 


I. Introduction: 

A spatial database is a database that store 
multidimensional questions, for example, focuses, 
rectangles, and so forth some spatial databases permit 
speaking to basic geometric protests, for example, lines, 
focuses and polygons. Some spatial databases handle 
more intricate structures, for example, 3D articles, 
topological coverage's, straight systems. In view of 
distinctive determination criteria spatial database gives 
quick access to multidimensional articles. In spatial 
database genuine elements are displayed in geometric 
way, for instance area of inns, doctor's facility, eateries 
are spoken to as focuses on maps, while bigger region, 
for example, scenes, lakes, parks are spoken to as a mix 
of rectangles. Spatial database framework can utilized 
as a part of geographic data framework, in this extent 
hunt can be used to discover all eateries in a certain 
range, while closest neighbor recovery can locate the 
eatery more like a given location. Questions in spatial 
database have turn out to be progressively imperative as 
of late with the expanding prominence of a few 
administrations, for example, Google Earth and Yahoo 
Maps, and other geographic applications. Today, 
broadly utilized of internet searchers has made it 
practical to compose spatial inquiries in another way. 
Generally, questions concentrate on items just 
geometric properties, for instance whether a point is in 
rectangle or how two focuses are close from one 
another. Some new application permits clients to search 
articles taking into account both of their geometric 
directions and their related writings. Such kind of 
inquiries called as spatial magic word inquiry. Case in 
point, if a web index can be utilized to discover closest 
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inn that offer offices, for example, pool and web in the 
meantime. From this inquiry, we could first acquire the 
whole lodging whose administrations contain the 
arrangement of decisive words, and after that locate the 
closest one from the recovered eatery. The real 
disadvantage of this methodology is that, on the 
troublesome data they don't give continuous answer. 
Case in point, from the question point the genuine 
neighbor lies far away, while all the closer neighbors 
are lost no less than one of the inquiry magic words. 
Spatial pivotal word questions have not been broadly 
investigated. In the previous years, the gathering of 
individuals has demonstrated enthusiasm for 
considering pivotal word seek in social databases. As of 
late the consideration has engrossed to 
multidimensional information [5] [6], The best system 
for closest neighbor look with decisive words is a result 
of Felipe et al. [5]. They join the spatial file R-tree [7] 
and mark record [8], So they added to a structure called 
IR2 - tree. This tree has the capacity of both R-tree and 
mark records. Like R-tree it stores the spatial vicinity of 
item and like mark document it channels those articles 
that do exclude all inquiry pivotal words. 

For substance comparability pursuits taking into 
account highlight vectors, characterizing the separation 
for speaking to the size of likeness is vital. Euclidean 
separation is an ordinary measure of separation, yet 
there are a few issues with this separation as takes after: 
(1) Euclidean separation is to a great degree delicate to 
the sizes of the component qualities, and (2) Euclidean 
separation is incognizant in regards to connected 
elements. In this paper, we propose a quick 
multidimensional closest neighbor look calculation in 
view of ellipsoid separation. Ellipsoid separation 
considers the connection among elements in 
ascertaining separation. By utilizing ellipsoid 
separation, the issues of scale and connection innate in 
Euclidean separation are no more an issue. With the 
calculation proposed in this paper, proficient end of 
pointless number-crunching operations has been 
accomplished by changing over the computation of 
ellipsoid separation to figuring of Euclidean separation 
through a spatial change performed utilizing Cholesky 
deterioration. 

II. Related Work: 

In the paper 'quick closest neighbor seek with 
watchwords', there are systems like spatial file, altered 
list, closest neighbor look. The principal strategy spatial 
file is utilized for making records in light of the fact 
that there is immense measure of information should be 


put away for looking that information put away as xml 
archives. In the event that the information stockpiling 
made as records then space needed is less additionally 
time required for looking the decisive word is less. 
Second technique is reversed list. The rearranged file 
information structure is a focal part of a run of the mill 
web index indexing calculation. An objective of an 
internet searcher execution is to advance the rate of the 
question: discover the archives where word happens. 
When a file is produced, which procurements 
arrangements of words per archive; it is next rearranged 
to build up an upset record. Questioning the list would 
require consecutive cycle through every report and to 
every word to check a coordinating record. The time 
memory and preparing property to execute such an 
inquiry are not generally hypothetically reasonable. As 
opposed to posting the words per article in the record, 
the transformed list information structure is created 
which records the archives per word. The rearranged 
record delivered, the question can now be dictated by 
bouncing to the word id in the altered list. These were 
successfully transformed lists with a little measure of 
supplementary clarification that obliged an impossible 
measure of endeavor to deliver. Third strategy is closest 
neighbor look. Closest neighbor seek (NNS), 
additionally recognized as closeness inquiry, parallel 
quest is an advancement issue for discovering nearest 
focuses in metric spaces. In the paper 'Productive 
Keyword-Based Search for Top-K Cells in Text Cube' 
strategies utilized are rearranged record one-sweep, 
archive sorted-output, base up element programming, 
and pursuit space requesting. In the top k cells, there is 
a seeking of closest key to the question. Blocks 
structures bunches of single interesting gathering which 
demonstrates its personality. Strategy like reversed 
record utilized for giving list as opposed to giving 
entire information which can be space devouring. 

A spatial database oversees multidimensional articles, 
(for example, focuses, rectangles, and so forth.), and 
gives quick access to those items taking into account 
diverse choice criteria. The significance of spatial 
databases is reflected by the comfort of demonstrating 
elements of reality in a geometric way [3], Case in 
point, areas of eateries, inns, healing centers thus on are 
regularly spoken to as focuses in a guide, while bigger 
degrees, for example, stops, lakes, and scenes 
frequently as a mix of rectangles [4], Numerous 
functionalities of a spatial database are valuable in 
different routes in particular connections [5]. 
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III. Nearest neighbor search of multidimensional 
data: 

A nearest neighbor search in a multidimensional space 
is the problem of finding the nearest vector to a given 
vector (query vector) q among N data vectors 
(candidate vectors) xi(i = 1, 2, . . . , N) placed in n- 
dimentional space. There are two typical varieties of 
nearest neighbor search: (i) k-nearest neighbor search 
(search restricted by number) The search attempts to 
find the k vectors closest to the given query vector q, 
(ii) e-nearest neighbor search (search restricted by 
range) The search attempts to find vectors within a 
distance s from the given query vector q; that is, vector 
xi satisfying d(q, xi) are found. In a linear search 
wherein a given vector is compared sequentially to all 
vectors in a database, the computational complexity 
increases in direct proportion to the database size. 
Therefore, the development of multidimensional 
indexing techniques for efficient nearest neighbor 
search has been attracting much attention recently. 5 
There are various algorithms for multidimensional 
indexing in a Euclidean space, such as R-tree,6 R+- 
tree,7 R*-tree,8 SS-tree,9 SS+-tree,10 CSS+-tree,ll X- 
tree,12 and SR-tree,13 as well as more general indexing 
methods for metric spaces, for example, VP-tree,14 
MVP-tree,15 M-tree,16 etc. Such indexing techniques 
are based on restriction of the search range by 
hierarchical partitioning of multidimensional search 
space, and they limit the scope of the basic search, 
we use the VP-tree as the object of comparison with the 
proposed method. The following is a brief overview of 
the VP-tree. The VP-tree is a method for 
multidimensional indexing of typical distance space. It 
aims to shrink the amount of space explored in the 
search by recursively partitioning multidimensional 
space, based on the distance between data points. The 
VP-tree uses a reference point known as a vantage 
point, and it has the special characteristic of not 
allowing a common area to arise in the partitioned 
space so that hyperspheres can be used to partition 
space in a top-down manner. By contrast, the M-tree, 
which partitions space in a bottom-up manner, has a 
drawback in that there are many common areas 
between the partitioned spaces, with the result that 
search efficiency declines. VP-tree index building can 
be summarized as follows. A vantage point (hereinafter 
referred to as vp) is selected for dataset s consisting of 
N number of data points by means of the random 
algorithms described below, (i) Select temporary vp 
randomly from the data set, (ii) Calculate the distance 
to the reset of the N -1 objects from the temporary, (iii) 
Calculate the intermediate value and distribution of 
these distances, (iv) The point of maximum 


distribution, obtained by repeating performing (i) - (iii) 
above, is designated vp. The intermediate value for the 
distance of all data in the data set S from the vp chosen 
as the root node is p. When d(p, q) is established as the 
distance between points p, q, data set S is partitioned 
into SI and S2 as follows: SI = {s eS|d(s, vq) < p} S2 
= {s GS|d(s, vq) > p} (1) In like manner, this 
partitioning operation is recursively applied to SI and 
S2 to create the index. The VP-tree index is represented 
by a tree structure, and subsets such as the above- 
mentioned SI and S2 each correspond to one node of 
the tree. In addition, each leaf node stores a number of 
data points. The search starts from the route nodes and 
follows the nodes conforming to the search scope, 
accesses data stored in the leaf node that it finally 
arrives at point by point, calculates the distance, and 
determines whether or not it conforms to the search 
scope. 

Problems with multidimensional indexing technology 
in high dimensions: Content searches of images and 
other multimedia content employ multidimensional 
feature vectors that may exceed 100 dimensions. 
Phenomena of the kind that cannot even be imaged in 
two-dimensional or three-dimensional space are known 
to occur in such high-dimensional space. Because the 
degree of spatial freedom is extremely high in higher¬ 
dimensional space, solving various problems in 
computational geometry and multivariate analysis 
involves an enormous amount of calculation and is 
hence notoriously difficult. These difficulties are 
collectively referred to as the “curse of dimensionality.” 
In nearest neighbor searches in high-dimensional space, 
a phenomenon occurs whereby the search becomes 
more and more difficult as the dimensionality becomes 
higher. For example, when points are uniformly 
distributed in n-dimensional space, the ratio of the 
distance of the k-th nearest and the (k + l)-th nearest 
point to a given point can be approximated by the 
following formula: 17 E{d(k+1)NN } E{dkNN } ~ 1 + 
1 kn (2) As you can see from the above, as n becomes 
larger, the ratio of the distance of the k-th nearest point 
and the (k+ l)-th nearest point asymptotically 
approaches 1. Moreover, when the points are uniformly 
distributed, the ratio of the distance to the nearest point 
to the distance to the most distant point asymptotically 
approaches 1 as the dimensionality becomes higher. 
Therefore, methods for dividing the space 
hierarchically entail problems in that the difference due 
to distance is small, making it impossible to limit the 
area explored, and an amount of calculation that 
approaches that of a linear search is required. 
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IV. PROPOSED SCHEMA: 

A spatial info manages dimensional objects (such as 
points, rectangles, etc.), and provides quick access to 
those objects supported totally different choice criteria. 
The importance of spatial databases is mirrored by the 
convenience of modelling entities of reality in an 
exceedingly geometric manner, for instance, locations 
of restaurants, hotels, hospitals so on square measure 
typically described as points in an exceedingly map, 
whereas larger extents like parks, lakes, and landscapes 
typically as a mix of rectangles, several functionalities 
of a spatial info square measure helpful in varied ways 
in which in specific contexts, as an example, in an 
exceedingly geographies system, vary search will be 
deployed to search out all restaurants in an exceedingly 
sure space, whereas nearest neighbor retrieval will 
discover the eating place nearest to a given address. 
Furthermore, because the Si-index relies on the 
traditional technology of inverted index, it's without 
delay incorporable in an exceedingly business computer 
programme that applies large similarity, implying its 
immediate industrial deserves. 

V. NEAREST NEIGHBOR SEARCH 
TECHNIQUE: 

A. IR-Tree, Approximation algorithm and exact 
algorithm: 

This method is used to retrieve a group of spatial web 
objects such that the query’s keywords are cover by 
group’s keywords and objects are near to the query 
location and have the lowest inter object distances. This 
method addresses the two instantiation of the group 
keyword query. First is to find the group of objects that 
cover the keywords such that the sum of their distances 
to the query is minimized. Second is to find a group of 
objects that cover the keywords such that sum of the 
maximum distance among an object in group of objects 
and query and maximum distance among two objects in 
group of objects is minimized. Both of these sub 
problems are NP-complete. Greedy algorithm is used to 
provide an approximation solution to the problem that 
utilizes the spatial keyword index IR-tree to reduce the 
search space. But in some application query does not 
contain a large number of keywords, for this exact 
algorithm is used that uses the dynamic programming. 
[ 1 ] 

B. IUR-tree (Intersection union R-tree) 

Geographic objects associated with descriptive texts are 
becoming common. This gives importance to spatial 


keyword queries that take both the location and text 
description of content. This technique is used to 
analyze the problem of reverse spatial and textual k 
nearest neighbor search i.e finding objects that takes the 
query object as one of their spatial textual similar 
objects. For this type of search hybrid index structure is 
used that successfully merge the location proximity 
with textual similarity. For searching, branch and 
bound algorithm is used. In addition to increase the 
speed of query processing a variant of IURtree and two 
optimization algorithm is used. To enhance the IUR¬ 
tree text clustering is used, in this objects of all the data 
base is group into clusters according to their text 
similarity. Each node of the tree is extended by the 
cluster information to create a hybrid tree which is 
called as cluster IUR-tree. To enhance the search 
performance of this tree two optimization methods is 
used, first is based on outlier detection and extraction 
and second method is based on text entropy. [2] 

C. BR* -tree: 

This hybrid index structure is used to search m-closest 
keywords. This technique finds the closest tuples that 
matches the keywords provided by the user. This 
structure combines the R* -tree and bitmap indexing to 
process the mclosest keyword query that returns the 
spatially closest objects matching m keywords To 
reduce the search space a priori based search strategy is 
used. Two monotone constraints is used as a priori 
properties to facilitates efficient pruning which is called 
as distance mutex and keyword mutex. But this 
approach is not suitable for handling ranking queries 
and in this number of false hits is large.[3] 

D. IR2 -tree: 

The growing number of applications requires the 
efficient execution of nearest neighbor queries which is 
constrained by the properties of spatial objects. 
Keyword search is very popular on the internet so these 
applications allow users to give list of keywords that 
spatial objects should contain. Such queries called as a 
spatial keyword query. This is consisted of query area 
and set of keywords. The IR2 -tree is developed by the 
combination of R-tree and signature files, where each 
node of tree has spatial and keyword information. This 
method is efficiently answering the top-k spatial 
keyword queries. In this signature is added to the every 
node of the tree. An able algorithm is used to answer 
the queries using the tree. Incremental nearest 
algorithm is used for the tree traversal and if root node 
signature does not match the query signature then it 
prunes the whole subtrees. But IR2 -tree has some 
drawbacks such as false hits where the object of final 
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result is far away from the query or this is not suitable 
for handling ranking queries.[4] 

E. Spatial inverted index and Minimum bounding 
method: 

So, new access method spatial inverted access method 
is used to remove the drawbacks of previous methods 
such as false hits. This method is the variant of inverted 
index using for multidimensional points. This index 
stores the spatial region of data points and on every 
inverted list Rtree is built. Minimum bounding method 
is used for traversing the tree to prune the search space. 

VI. Processing Spatial-Keyword (SK) Queries in 
Geographic Information Retrieval (GIR) 
Systems: 

Location based information stored in GIS database. 
These information entities of such databases have both 
spatial and textual descriptions. This paper proposes a 
framework for GIR system and focus on indexing 
strategies that can process spatial keyword query. The 
following contributions in this paper: 1) It gives 
framework for query processing in Geo- graphic 
Information Retrieval (GIR) Systems. 2) Develop a 
novel indexing structure called KR*-tree that captures 
the joint distribution of keywords in space and 
significantly improves performance over existing index 
structures. 3) This method have conducted experiments 
on real GIS datasets showing the effectiveness of our 
techniques compared to the existing solutions. It 
introduces two index structures to store spatial and 
textual information. 

A) Separate index for spatial and text attributes: 

Advantages: - 

1. Easy of maintaining two separate indices. 

2. Performance bottleneck lies in the number of 
candidate object generated during the filtering 
stage. 

Disadvantages: - 

1. If spatial filtering is done first, many objects may 
lie within a query is spatial extent, but very few of 
them are relevant to query keywords. This increases 
the disk access cost by generating a large number of 
candidate objects. The subsequent stage of keyword 
filtering becomes expensive. 

B) Hybrid index 

Advantages and limitations: - 

1. When query contains keywords that closely 
correlated in space, this approach suffer from 
paying extra disk cost accessing R*-tree and high 
overhead in subsequent merging process. 


VII. CONCLUSION 

In this paper, we proposed a new technique for efficient 
nearest neighbor search in a set of high-dimensional 
points. Our technique is based on the pre-computation 
of the solution space of any arbitrary nearest-neighbor 
search. This corresponds to the computation of the 
voronoi cells of the data points. Since voronoi cells 
may become rather complex when going to higher 
dimensions, we presented a new algorithm for the 
approximation of high-dimensional voronoi cells using 
a set of minimum bounding (hyper-) rectangles. 
Although our technique is based on a pre-computation 
of the solution space, it is dynamic, i.e. it supports 
insertions of new data points. We finally showed in an 
experimental evaluation that our technique is efficient 
for various kinds of data and clearly outperforms the 
state of the art nearest-neighbor algorithms.We have 
seen plenty of applications calling for a search engine 
that is able to efficiently support novel forms of spatial 
queries that are integrated with keyword search. The 
existing solutions to such queries either incur 
prohibitive space consumption or are unable to give 
real time answers. In this paper, we have remedied the 
situation by developing an access method called the 
spatial inverted index (Si-index). Not only that the 
Slindex is fairly space economical, but also it has the 
ability to perform keyword-augmented nearest neighbor 
search in time that is at the order of dozens of 
milliseconds. Furthermore, as the Si-index is based on 
the conventional technology of inverted index, it is 
readily incorporate in a commercial search engine that 
applies massive parallelism, implying its immediate 
industrial merits. 
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